blind attack
Blind Attacks on Machine Learners
The importance of studying the robustness of learners to malicious data is well established. While much work has been done establishing both robust estimators and effective data injection attacks when the attacker is omniscient, the ability of an attacker to provably harm learning while having access to little information is largely unstudied. We study the potential of a "blind attacker" to provably limit a learner's performance by data injection attack without observing the learner's training set or any parameter of the distribution from which it is drawn. We provide examples of simple yet effective attacks in two settings: firstly, where an "informed learner" knows the strategy chosen by the attacker, and secondly, where a "blind learner" knows only the proportion of malicious data and some family to which the malicious distribution chosen by the attacker belongs. For each attack, we analyze minimax rates of convergence and establish lower bounds on the learner's minimax risk, exhibiting limits on a learner's ability to learn under data injection attack even when the attacker is "blind".
Reviews: Blind Attacks on Machine Learners
Overall, I find this branch of work quite interesting and am glad the authors are choosing to study this problem. The attacks mentioned in the paper may become feasible in the age of large web-scale datasets or human-in-the-loop training systems, along with the privacy scenario mentioned in the paper. The authors do an excellent job of motivating the problem. The paper appears clearly written if the reader is an expert within the field of statistical decision theory. I must admit that this is not my area of expertise.
Blind Baselines Beat Membership Inference Attacks for Foundation Models
Das, Debeshee, Zhang, Jie, Tramèr, Florian
Membership inference (MI) attacks try to determine if a data sample was used to train a machine learning model. For foundation models trained on unknown Web data, MI attacks can be used to detect copyrighted training materials, measure test set contamination, or audit machine unlearning. Unfortunately, we find that evaluations of MI attacks for foundation models are flawed, because they sample members and non-members from different distributions. For 8 published MI evaluation datasets, we show that blind attacks -- that distinguish the member and non-member distributions without looking at any trained model -- outperform state-of-the-art MI attacks. Existing evaluations thus tell us nothing about membership leakage of a foundation model's training data.
Towards the Infeasibility of Membership Inference on Deep Models
Recent studies propose membership inference (MI) attacks on deep models. Despite the moderate accuracy of such MI attacks, we show that the way the attack accuracy is reported is often misleading and a simple blind attack which is highly unreliable and inefficient in reality can often represent similar accuracy. We show that the current MI attack models can only identify the membership of misclassified samples with mediocre accuracy at best, which only constitute a very small portion of training samples. We analyze several new features that have not been explored for membership inference before, including distance to the decision boundary and gradient norms, and conclude that deep models' responses are mostly indistinguishable among train and non-train samples. Moreover, in contrast with general intuition that deeper models have a capacity to memorize training samples, and, hence, they are more vulnerable to membership inference, we find no evidence to support that and in some cases deeper models are often harder to launch membership inference attack on. Furthermore, despite the common belief, we show that overfitting does not necessarily lead to higher degree of membership leakage. We conduct experiments on MNIST, CIFAR-10, CIFAR-100, and ImageNet, using various model architecture, including LeNet, ResNet, DenseNet, InceptionV3, and Xception.
Blind Attacks on Machine Learners
Beatson, Alex, Wang, Zhaoran, Liu, Han
The importance of studying the robustness of learners to malicious data is well established. While much work has been done establishing both robust estimators and effective data injection attacks when the attacker is omniscient, the ability of an attacker to provably harm learning while having access to little information is largely unstudied. We study the potential of a "blind attacker" to provably limit a learner's performance by data injection attack without observing the learner's training set or any parameter of the distribution from which it is drawn. We provide examples of simple yet effective attacks in two settings: firstly, where an "informed learner" knows the strategy chosen by the attacker, and secondly, where a "blind learner" knows only the proportion of malicious data and some family to which the malicious distribution chosen by the attacker belongs. For each attack, we analyze minimax rates of convergence and establish lower bounds on the learner's minimax risk, exhibiting limits on a learner's ability to learn under data injection attack even when the attacker is "blind".